More accurate tests for the statistical significance of result differences
Abstract
Appears in the 18th International Conference on Computational Linguistics (COLING 2000), pages 947-953, Saarbrücken, Germany, July 2000. cs.CL/0008005

Statistical significance testing of differences in values of metrics like recall, precision and balanced F-score is a necessary part of empirical natural language processing. Unfortunately, we find in a set of experiments that many commonly used tests often underestimate the significance and so are less likely to detect differences that exist between different techniques. This underestimation comes from an independence assumption that is often violated. We point out some useful tests that do not make this assumption, including computationally-intensive randomization tests.
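To make the idea of a randomization test concrete, here is a minimal sketch of a paired approximate randomization test on balanced F-score. It is an illustration under stated assumptions, not necessarily the exact procedure used in the paper: both systems are assumed to have been run on the same test items, each item is assumed to yield a (true positive, false positive, false negative) count triple, and the names `f_score` and `randomization_test` are hypothetical.

```python
import random


def f_score(counts):
    """Balanced F-score pooled over a list of per-item (tp, fp, fn) tuples."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def randomization_test(a, b, trials=9999, seed=0):
    """Two-sided approximate randomization test on the F-score difference.

    a, b: per-item (tp, fp, fn) tuples for two systems on the SAME items
    (an assumption of this sketch). Returns an estimated p-value.
    """
    rng = random.Random(seed)
    observed = abs(f_score(a) - f_score(b))
    at_least = 0
    for _ in range(trials):
        shuffled_a, shuffled_b = [], []
        for x, y in zip(a, b):
            # Under the null hypothesis the two systems are interchangeable,
            # so each item's pair of outputs is swapped with probability 0.5.
            if rng.random() < 0.5:
                x, y = y, x
            shuffled_a.append(x)
            shuffled_b.append(y)
        if abs(f_score(shuffled_a) - f_score(shuffled_b)) >= observed:
            at_least += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least + 1) / (trials + 1)
```

Because each shuffle swaps the two systems' outputs item by item, the pairing between the systems on every test item is preserved, so the test does not need to treat the two result sets as independent samples.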
Publication date: 2000